9 research outputs found

    How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval

    The knowledge representation community has built general-purpose ontologies which contain large amounts of commonsense knowledge over relevant aspects of the world, including useful visual information, e.g., "a ball is used by a football player", "a tennis player is located at a tennis court". Current state-of-the-art approaches for visual recognition do not exploit these rule-based knowledge sources. Instead, they learn recognition models directly from training examples. In this paper, we study how general-purpose ontologies, specifically MIT's ConceptNet ontology, can improve the performance of state-of-the-art vision systems. As a testbed, we tackle the problem of sentence-based image retrieval. Our retrieval approach incorporates knowledge from ConceptNet on top of a large pool of object detectors derived from a deep learning technique. In our experiments, we show that ConceptNet can improve performance on a common benchmark dataset. Key to our performance is the use of the ESPGAME dataset to select visually relevant relations from ConceptNet. Consequently, a main conclusion of this work is that general-purpose commonsense ontologies improve performance on visual reasoning tasks when properly filtered to select meaningful visual relations. Comment: Accepted in IJCAI-1

    Interpretable Sequence Classification via Discrete Optimization

    Sequence classification is the task of predicting a class label given a sequence of observations. In many applications such as healthcare monitoring or intrusion detection, early classification is crucial to prompt intervention. In this work, we learn sequence classifiers that favour early classification from an evolving observation trace. While many state-of-the-art sequence classifiers are neural networks, and in particular LSTMs, our classifiers take the form of finite state automata and are learned via discrete optimization. Our automata-based classifiers are interpretable (supporting explanation, counterfactual reasoning, and human-in-the-loop modification) and have strong empirical performance. Experiments over a suite of goal recognition and behaviour classification datasets show that our learned automata-based classifiers achieve test performance comparable to LSTM-based classifiers, with the added advantage of being interpretable.
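To make the idea of early classification with an automaton concrete, here is a minimal sketch of inference only. It does not reproduce the discrete-optimization learning step described in the abstract; the automaton is written by hand, and the state names, observation symbols, and the `decided` mapping are all hypothetical. It also assumes that an observation with no listed transition leaves the state unchanged.

```python
def classify_early(transitions, decided, start, observations):
    """Run a hand-written automaton over a trace and stop at the first decision.

    transitions: dict mapping (state, observation) -> next state
    decided:     dict mapping a state -> class label (hypothetical "decision" states)
    Returns (label, steps_used); label is None if no decision state was reached.
    """
    state = start
    for steps_used, obs in enumerate(observations, start=1):
        # Assumption: unlisted (state, observation) pairs keep the current state.
        state = transitions.get((state, obs), state)
        if state in decided:
            # Classify as soon as a decision state is entered.
            return decided[state], steps_used
    return None, len(observations)


# Hypothetical intrusion-detection automaton: two consecutive failed logins.
transitions = {
    ("q0", "login_fail"): "q1",
    ("q1", "login_fail"): "q2",
    ("q1", "login_ok"): "q0",
}
decided = {"q2": "intrusion"}

label, steps = classify_early(
    transitions, decided, "q0", ["login_fail", "login_fail", "login_ok"]
)
```

Because every decision corresponds to a named state and a visible path of transitions, a reader can trace why a label was produced, which is the kind of interpretability the abstract refers to.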

    Noisy Symbolic Abstractions for Deep RL: A case study with Reward Machines

    Natural and formal languages provide an effective mechanism for humans to specify instructions and reward functions. We investigate how to generate policies via RL when reward functions are specified in a symbolic language captured by Reward Machines, an increasingly popular automaton-inspired structure. We are interested in the case where the mapping of environment state to a symbolic (here, Reward Machine) vocabulary, commonly known as the labelling function, is uncertain from the perspective of the agent. We formulate the problem of policy learning in Reward Machines with noisy symbolic abstractions as a special class of POMDP optimization problem, and investigate several methods to address the problem, building on existing and new techniques, the latter focused on predicting Reward Machine state rather than on grounding individual symbols. We analyze these methods and evaluate them experimentally under varying degrees of uncertainty in the correct interpretation of the symbolic vocabulary. We verify the strength of our approach and the limitations of existing methods via an empirical investigation on both illustrative toy domains and partially observable deep RL domains. Comment: NeurIPS Deep Reinforcement Learning Workshop 202

    Reward Machines

    Reinforcement learning involves the study of how to solve sequential decision-making problems using minimal supervision or prior knowledge. In contrast to most methods for automated decision making, reinforcement learning agents do not require access to a formal description of the problem in order to find optimal solutions. Instead, such agents learn optimal behaviour using a trial-and-error strategy. This learning strategy makes reinforcement learning a strong candidate to tackle real-world problems with complex dynamics. However, there are two core challenges that limit the use of reinforcement learning in the real world: sample efficiency and partial observability. In this dissertation, we tackle these two problems through the creation of reward machines. Reward machines are automata-based representations of a reward function that expose reward structures to the agent. We show that agents can exploit these structures to improve their sample efficiency and their performance under partial observability. In particular, we propose (i) to use decomposition methods and reward shaping to improve sample efficiency based on a given reward machine, and (ii) to use reward machines as external memory and solve partially observable tasks by learning reward machines. In both cases, we provide theoretical and empirical evidence of the benefits of utilizing reward machines to tackle these problems. Finally, we conclude this dissertation with a discussion of the role that reward machines can play in tackling other long-standing problems in reinforcement learning, such as developing agents with interpretable behaviours, providing guarantees that an optimal policy is safe, and creating agents that can understand instructions. Ph.D
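As a rough illustration of "automata-based representations of a reward function", the sketch below encodes a reward machine as a finite-state machine whose transitions emit rewards. It is a simplification under stated assumptions, not the dissertation's formulation: each input is a single symbol rather than a set of true propositions, rewards are constants, and the class name, task, and symbols are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine whose transitions emit rewards (simplified sketch)."""

    initial_state: int
    # Maps (state, symbol) -> (next_state, reward).
    transitions: dict = field(default_factory=dict)

    def step(self, state, symbol):
        # Assumption: unlisted (state, symbol) pairs self-loop with zero reward.
        return self.transitions.get((state, symbol), (state, 0.0))

    def run(self, symbols):
        """Feed a sequence of symbols; return (final_state, total_reward)."""
        state, total = self.initial_state, 0.0
        for symbol in symbols:
            state, reward = self.step(state, symbol)
            total += reward
        return state, total


# Hypothetical task: pick up "coffee" first, then reach the "office".
rm = RewardMachine(
    initial_state=0,
    transitions={
        (0, "coffee"): (1, 0.0),
        (1, "office"): (2, 1.0),
    },
)

final_state, total_reward = rm.run(["office", "coffee", "office"])
```

The machine's current state records how much of the task has been completed, which is what lets an agent treat it as external memory or decompose the task per state, as the abstract describes.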

    Solving Task Scheduling Problems in Dew Computing via Deep Reinforcement Learning

    Due to the ubiquity of mobile and IoT devices and their ever-growing processing potential, Dew computing environments have become an emerging topic for researchers. These environments allow resource-constrained devices to contribute computing power to others in a local network. One major challenge in these environments is task scheduling: that is, how to distribute jobs across devices available in the network. In this paper, we propose to distribute jobs in Dew environments using artificial intelligence (AI). Specifically, we show that an AI agent trained with Proximal Policy Optimization (PPO) can learn to distribute jobs in a simulated Dew environment better than existing methods, even when tested over job sequences that are five times longer than the sequences used during training. We found that our technique can gain up to 77% in performance compared with human-designed heuristics.

    Symbolic Plans as High-Level Instructions for Reinforcement Learning

    Reinforcement learning (RL) agents seek to maximize the cumulative reward obtained when interacting with their environment. Users define tasks or goals for RL agents by designing specialized reward functions such that maximization aligns with task satisfaction. This work explores the use of high-level symbolic action models as a framework for defining final-state goal tasks and automatically producing their corresponding reward functions. We also show how automated planning can be used to synthesize high-level plans that can guide hierarchical RL (HRL) techniques towards efficiently learning adequate policies. We provide a formal characterization of taskable RL environments and describe sufficient conditions that guarantee we can satisfy various notions of optimality (e.g., minimize total cost, maximize probability of reaching the goal). In addition, we conduct an empirical evaluation showing that our approach converges to near-optimal solutions faster than standard RL and HRL methods and that it provides an effective framework for transferring learned skills across multiple tasks in a given environment.